week 9: advanced methods

measurement

library(tidyverse)
library(psych)
library(cowplot)
library(patchwork)
library(here)
library(brms) 
library(tidybayes)
library(ggdag)
library(ggrepel) # for putting labels on figures

measurement error

Measurement is the principled assignment of numbers to qualities. And no matter what you measure or how you measure it, you’ll have some error.

Some tools tend to produce very little error (e.g., the length of this table in inches). Other tools tend to produce more error. In the social sciences, our job is made harder by the fact that we often measure qualities that do not have an objective physical reality, like happiness or well-being. Measurement error is exacerbated when there is little available data.

Unfortunately, our statistical models will assume our measures have no error… unless we can tell the model how much error there is.

In measurement theory, we may assume that

\[ X_i = T_i + \epsilon_i \]

Meaning that for any observation \(i\), the observed score \(X\) on some measure is the sum of the true score \(T\) and error \(\epsilon\). Classical test theory assumes that \(\epsilon_i\) is randomly distributed, but other frameworks, like item response theory (IRT), relax that assumption. Regardless, we can move forward with the assumption that observed scores are caused by some true score and some error.
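To make the decomposition concrete, here is a quick simulation (the variable names and values are illustrative, not from any dataset):

```r
# simulate classical test theory: observed = true + random error
set.seed(15)

n          <- 1e4
true_score <- rnorm(n, mean = 0, sd = 1)    # T_i
error      <- rnorm(n, mean = 0, sd = 0.5)  # epsilon_i, independent of T_i
observed   <- true_score + error            # X_i

# with independent error, var(X) = var(T) + var(epsilon) (up to sampling noise)
var(observed)
var(true_score) + var(error)

# "reliability" is the share of observed variance that is true-score variance
var(true_score) / var(observed)
```

Note that even with a generous error SD of 0.5, most of the observed variance here still comes from the true scores; the point is that the model sees only `observed`, never `true_score`.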

marriage example

data(WaffleDivorce, package="rethinking")
d <- WaffleDivorce

rethinking::precis(d) 
                          mean           sd     5.5%        94.5%
Location                   NaN           NA       NA           NA
Loc                        NaN           NA       NA           NA
Population        6.119600e+00 6.876156e+00  0.65780 1.897690e+01
MedianAgeMarriage 2.605400e+01 1.243630e+00 24.26950 2.826100e+01
Marriage          2.011400e+01 3.797905e+00 15.20850 2.649150e+01
Marriage.SE       1.399400e+00 7.969749e-01  0.54950 2.902200e+00
Divorce           9.688000e+00 1.820814e+00  6.66950 1.273050e+01
Divorce.SE        9.618000e-01 5.253675e-01  0.34085 1.893050e+00
WaffleHouses      3.234000e+01 6.578959e+01  0.00000 1.357450e+02
South             2.800000e-01 4.535574e-01  0.00000 1.000000e+00
Slaves1860        7.937834e+04 1.497309e+05  0.00000 4.355531e+05
Population1860    6.287293e+05 7.813127e+05  0.00000 1.903357e+06
PropSlaves1860    9.405132e-02 1.744486e-01  0.00000 4.561000e-01
                       histogram
Location                        
Loc                             
Population              ▇▃▁▁▁▁▁▁
MedianAgeMarriage ▁▁▂▂▃▇▅▃▁▁▂▁▁▁
Marriage              ▁▃▇▇▇▅▂▁▁▁
Marriage.SE             ▁▇▅▃▁▂▁▁
Divorce                 ▂▃▅▅▇▂▃▁
Divorce.SE          ▂▇▇▃▃▂▁▂▂▁▁▁
WaffleHouses            ▇▁▁▁▁▁▁▁
South                 ▇▁▁▁▁▁▁▁▁▂
Slaves1860            ▇▁▁▁▁▁▁▁▁▁
Population1860          ▇▃▂▁▁▁▁▁
PropSlaves1860      ▇▁▁▁▁▁▁▁▁▁▁▁
dag_coords <-
  tibble(name = c("A", "M", "D", "Dobs", "eD"),
         x    = c(1, 2, 2, 3, 4),
         y    = c(2, 3, 1, 1, 1))

dagify(M    ~ A,
       D    ~ A + M,
       Dobs ~ D + eD,
       coords = dag_coords) %>%
  tidy_dagitty() %>% 
  mutate(color = ifelse(name %in% c("D", "eD"), "a", "b")) %>% 
  
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_point(aes(color = color),
                 size = 10, show.legend = F) +
  geom_dag_text(parse = T, label = c("A", "D", expression(D[obs]), 
  "M",expression(italic(e)[D]))) +
  geom_dag_edges() +
  theme_void()
p1 <- d %>%
  ggplot(aes(x = MedianAgeMarriage, 
             y = Divorce,
             ymin = Divorce - Divorce.SE, 
             ymax = Divorce + Divorce.SE)) +
  geom_pointrange(shape = 20, alpha = 2/3, color="#1c5253") +
  labs(x = "Median age marriage" , 
       y = "Divorce rate")

p2 <-
  d %>%
  ggplot(aes(x = log(Population), 
             y = Divorce,
             ymin = Divorce - Divorce.SE, 
             ymax = Divorce + Divorce.SE)) +
  geom_pointrange(shape = 20, alpha = 2/3, color="#e07a5f") +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab("log population")

p1 | p2

How do you incorporate the standard error into the model? First, we’ll think through this logically, then we’ll use the code.

First, the standard error describes how much variability we would expect to see in a statistic across repeated samples. In other words, the error gives us the spread of a distribution. What’s unknown in this case is the mean of that distribution — the true value.

\[ D_{\text{obs},i} \sim \text{Normal}(D_i, D_{\text{SE},i}) \]

Well hey, isn’t that the kind of parameter that Bayesian analysis has been trying to estimate all along?

mathematical model

\[\begin{align} D_{\text{obs},i} &\sim \text{Normal}(D_i, D_{\text{SE},i}) \\ D_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta_AA_i + \beta_MM_i \\ \alpha &\sim \text{Normal}(0, 0.2) \\ \beta_A &\sim \text{Normal}(0, 0.5) \\ \beta_M &\sim \text{Normal}(0, 0.5) \\ \sigma &\sim \text{Exponential}(1) \end{align}\]

For the sake of easier priors, we’ll standardize all our variables first. Note that we standardize a standard error by dividing it by the same standard deviation as its variable, which keeps it in the same units as the variable it refers to.

d <-
  d %>% 
  mutate(D_obs = (Divorce - mean(Divorce)) / sd(Divorce),
         D_sd  = Divorce.SE / sd(Divorce),
         M     = (Marriage - mean(Marriage)) / sd(Marriage),
         A     = (MedianAgeMarriage - mean(MedianAgeMarriage)) / sd(MedianAgeMarriage),
         M_obs = M,
         M_sd  = Marriage.SE / sd(Marriage))
m1 <- 
  brm(data = d, 
      family = gaussian,
      D_obs | mi(D_sd) ~ 1 + A + M,
      prior = c(prior(normal(0, 0.2), class = Intercept),
                prior(normal(0, 0.5), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4,
      seed = 15,
      # note this line
      save_pars = save_pars(latent = TRUE),
      file = here("files/models/m92.1"))
m1
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: D_obs | mi(D_sd) ~ 1 + A + M 
   Data: d (Number of observations: 50) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -0.05      0.10    -0.24     0.14 1.00     4597     3090
A            -0.62      0.16    -0.92    -0.30 1.00     3829     3423
M             0.05      0.17    -0.27     0.38 1.00     3671     3297

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.58      0.11     0.39     0.81 1.00     1682     1720

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
posterior_summary(m1)
                Estimate  Est.Error         Q2.5        Q97.5
b_Intercept  -0.05287903 0.09560518  -0.23886745   0.13726794
b_A          -0.61700465 0.15906799  -0.92245919  -0.29941922
b_M           0.05250207 0.16596210  -0.27093761   0.38138566
sigma         0.58407876 0.10676964   0.38881281   0.81491478
Intercept    -0.05287903 0.09560518  -0.23886745   0.13726794
Yl[1]         1.16380958 0.36399705   0.45678248   1.88019251
Yl[2]         0.68798528 0.55545971  -0.40532881   1.78866913
Yl[3]         0.42646352 0.33696468  -0.22714592   1.07800780
Yl[4]         1.41990481 0.45735398   0.51982522   2.33835163
Yl[5]        -0.90062131 0.12857833  -1.15488620  -0.64688189
Yl[6]         0.65994433 0.39848512  -0.10032271   1.45295620
Yl[7]        -1.36526810 0.35504558  -2.05486652  -0.66823571
Yl[8]        -0.33840244 0.47247230  -1.25609775   0.58214114
Yl[9]        -1.90150049 0.58817751  -3.05336435  -0.72470074
Yl[10]       -0.62132190 0.17186028  -0.95587821  -0.27755085
Yl[11]        0.76341246 0.28108009   0.20629632   1.31262617
Yl[12]       -0.55322097 0.48671358  -1.53032059   0.41006406
Yl[13]        0.18014145 0.50648969  -0.85327426   1.15270187
Yl[14]       -0.87210178 0.22594215  -1.31849712  -0.42656000
Yl[15]        0.56087538 0.30848278  -0.03307509   1.16024922
Yl[16]        0.28374664 0.40123574  -0.53788724   1.07797302
Yl[17]        0.49757257 0.42622575  -0.34424766   1.32234855
Yl[18]        1.25343126 0.34850197   0.58866869   1.93645181
Yl[19]        0.43293128 0.38544033  -0.30272767   1.20778192
Yl[20]        0.40469307 0.54221317  -0.60265472   1.49375005
Yl[21]       -0.55866095 0.31505583  -1.18076309   0.06387204
Yl[22]       -1.10085094 0.26189265  -1.60983803  -0.58917295
Yl[23]       -0.26811533 0.26452671  -0.78206170   0.24950601
Yl[24]       -1.00154932 0.30528291  -1.62109423  -0.40641259
Yl[25]        0.43228569 0.39575018  -0.33219269   1.22640132
Yl[26]       -0.03375897 0.31355089  -0.66626319   0.58620335
Yl[27]       -0.01176553 0.50233364  -0.99239479   1.00288051
Yl[28]       -0.14619945 0.38981782  -0.92315210   0.61324160
Yl[29]       -0.26257719 0.50330444  -1.20212247   0.74789655
Yl[30]       -1.80061034 0.23677345  -2.25647365  -1.33209152
Yl[31]        0.17265156 0.41087733  -0.63822136   0.99652013
Yl[32]       -1.66073596 0.16260915  -1.98035225  -1.34235191
Yl[33]        0.11852901 0.24138612  -0.36122545   0.60586708
Yl[34]       -0.05418447 0.51141807  -1.11458891   0.91539123
Yl[35]       -0.12291325 0.22737764  -0.57408527   0.32393579
Yl[36]        1.28479965 0.40761437   0.48882609   2.11017466
Yl[37]        0.22368318 0.35026426  -0.46485333   0.91459730
Yl[38]       -1.02508669 0.21833912  -1.44644004  -0.59395760
Yl[39]       -0.92956225 0.54071523  -1.96249272   0.16346135
Yl[40]       -0.67944819 0.32342300  -1.31483203  -0.05196159
Yl[41]        0.24407462 0.54826453  -0.81917624   1.32536575
Yl[42]        0.74595423 0.33110341   0.09928354   1.38997465
Yl[43]        0.19636347 0.18408346  -0.16880604   0.55807729
Yl[44]        0.80183926 0.42335058  -0.05159185   1.60462754
Yl[45]       -0.40557248 0.52661056  -1.42459914   0.68427256
Yl[46]       -0.39240888 0.25684943  -0.89652352   0.09167503
Yl[47]        0.13440823 0.30195290  -0.46337386   0.72571630
Yl[48]        0.55957284 0.44795065  -0.31515399   1.44366592
Yl[49]       -0.63550849 0.27979049  -1.18884521  -0.08282839
Yl[50]        0.85003581 0.58270897  -0.33293414   1.97866945
lprior       -1.36690837 0.44820845  -2.44940868  -0.71213312
lp__        -78.15515223 6.57052200 -91.46921227 -65.53798980
states <- c("AL", "AR", "ME", "NH", "RI", "DC", "VT", "AK", "SD", "UT", "ID", "ND", "WY")

d_est <-
  posterior_summary(m1) %>% 
  data.frame() %>% 
  rownames_to_column("term") %>% 
  mutate(D_est = Estimate) %>% 
  select(term, D_est) %>% 
  filter(str_detect(term, "Yl")) %>% 
  bind_cols(d)


d_est %>%
  ggplot(aes(x = D_sd, y = D_est - D_obs)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_point(alpha = 2/3) +
  geom_text_repel(data = . %>% filter(Loc %in% states),  
                  aes(label = Loc), 
                  size = 3, seed = 15) 
states <- c("AR", "ME", "RI", "ID", "WY", "ND", "MN")

as_draws_df(m1) %>% 
  expand_grid(A = seq(from = -3.5, to = 3.5, length.out = 50)) %>% 
  mutate(fitted = b_Intercept + b_A * A) %>% 
  
  ggplot(aes(x = A)) +
  stat_lineribbon(aes(y = fitted),
                  .width = .95, size = 1/3,
                  color = "grey20",
                  fill = "grey80") +
  geom_segment(data = d_est,
               aes(xend = A, y = D_obs, yend = D_est),
               linewidth = 1/5) +
  geom_point(data = d_est,
             aes(y = D_obs)) +
  geom_point(data = d_est,
             aes(y = D_est),
             shape = 1, stroke = 1/3) +
  geom_text_repel(data = d %>% filter(Loc %in% states),  
                  aes(y = D_obs, label = Loc), 
                  size = 3, seed = 15) +
  labs(x = "median age marriage (std)",
       y = "divorce rate (std)") +
  coord_cartesian(xlim = range(d$A), 
                  ylim = range(d$D_obs))

modeling with error

- incorporating measurement error into your model results in regularization: observations associated with more error are less influential on the model, and the estimated true score of those points can be (but are not necessarily) quite different from the observation. 

- this will result in a model that is more conservative but one you can have more faith in. 

- of course, there's an obvious application of this kind of modeling...
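To see the regularization mechanically: with a normal likelihood and a known residual SD, the posterior mean of a state’s true score is a precision-weighted average of its observed value and the regression line’s prediction. A back-of-the-envelope sketch — the function and the numbers below are illustrative, not taken from the fitted model:

```r
# precision-weighted shrinkage: posterior mean of a true score, given an
# observed value (with known SE) and a regression prediction (with known sigma)
shrunk_estimate <- function(obs, se, mu, sigma) {
  w_obs <- 1 / se^2     # precision of the observation
  w_mu  <- 1 / sigma^2  # precision of the regression prediction
  (w_obs * obs + w_mu * mu) / (w_obs + w_mu)
}

# a precisely measured state barely moves toward the regression line...
shrunk_estimate(obs = 1.0, se = 0.2, mu = 0, sigma = 0.6)  # 0.90

# ...while a noisily measured state is pulled strongly toward it
shrunk_estimate(obs = 1.0, se = 1.0, mu = 0, sigma = 0.6)  # about 0.26
```

This is the same pattern you see in the shrinkage figure above: the states with the largest `D_sd` show the largest gaps between `D_obs` and `D_est`.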

meta-analysis

The goal of a meta-analysis is to estimate meaningful parameters. The studies in a set of broadly comparable studies may be either…

1. identical replications of each other, in that all studies draw identical samples from the same population, use the same outcome measures, etc.;
2. so different that the results of any one study provide no information about the results of any of the others; or
3. exchangeable but not necessarily either identical or completely unrelated; in other words, we allow differences from study to study, but those differences are not expected a priori to have predictable effects favoring one study over another.

In other words, you can answer the question with complete pooling, no pooling, or partial pooling. And generally speaking, the third of these is probably the one you want.
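The first two options map onto familiar estimators, which we can sketch in base R with made-up effect sizes and standard errors for five hypothetical studies (partial pooling itself requires a hierarchical model, e.g. a brms formula like `y | se(se_y) ~ 1 + (1 | study)`, which we won’t fit here):

```r
# made-up effect sizes and standard errors for five hypothetical studies
y  <- c(0.30, 0.12, 0.45, 0.08, 0.25)
se <- c(0.10, 0.25, 0.15, 0.30, 0.12)

# no pooling: each study stands alone
no_pooling <- y

# complete pooling: one common effect, estimated by the
# inverse-variance-weighted (fixed-effect) average
w <- 1 / se^2
complete_pooling <- sum(w * y) / sum(w)
complete_pooling

# partial pooling lies between these extremes: each study's estimate gets
# shrunk toward the common mean, and more so when its standard error is large
```

Notice that complete pooling already weights precise studies more heavily; partial pooling adds a between-study variance term so the studies inform, without dictating, one another.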

Our data come from the second large-scale replication project by the Many Labs team (Klein et al., 2018). Of the 28 studies replicated in the project, we will focus on the replication of the trolley experiment from Hauser et al. (2007).

From Klein and colleagues:

According to the principle of double effect, an act that harms other people is more morally permissible if the act is a foreseen side effect rather than the means to the greater good. Hauser et al. (2007) compared participants’ reactions to two scenarios to test whether their judgments followed this principle. In the foreseen-side-effect scenario, a person on an out-of-control train changed the train’s trajectory so that the train killed one person instead of five. In the greater-good scenario, a person pushed a fat man in front of a train, killing him, to save five people. Whereas 89% of participants judged the action in the foreseen-side-effect scenario as permissible (95% CI = [87%, 91%]), only 11% of participants in the greater-good scenario judged it as permissible (95% CI = [9%, 13%]). The difference between the percentages was significant, \(\chi^2(1, N = 2{,}646) = 1{,}615.96\), \(p < .001\), \(w = .78\), \(d = 2.50\), 95% CI = [2.22, 2.86]. Thus, the results provided evidence for the principle of double effect. (p. 459, emphasis in the original)

You can download the data from OSF. You want the .csv file in Trolley Dilemma 1 > Hauser.1 > By Order > Data > Hauser_1_study_by_order_all_CLEAN_CASE.csv

d <- read_csv(here("files/data/external_data/Hauser_1_study_by_order_all_CLEAN_CASE.csv"))
glimpse(d)
Rows: 6,842
Columns: 27
$ uID              <dbl> 65, 68, 102, 126, 145, 263, 267, 298, 309, 318, 350, …
$ variable         <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes"…
$ factor           <chr> "SideEffect", "SideEffect", "SideEffect", "SideEffect…
$ .id              <chr> "ML2_Slate1_Brazil__Portuguese_execution_illegal_r.cs…
$ source           <chr> "brasilia", "brasilia", "brasilia", "wilfredlaur", "w…
$ haus1.1          <dbl> 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1,…
$ haus1.1t_1       <dbl> 39.054, 36.792, 56.493, 21.908, 25.635, 50.633, 58.66…
$ haus2.1          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ haus2.1t_1       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ Source.Global    <chr> "brasilia", "brasilia", "brasilia", "wilfredlaur", "w…
$ Source.Primary   <chr> "brasilia", "brasilia", "brasilia", "wilfredlaur", "w…
$ Source.Secondary <chr> "brasilia", "brasilia", "brasilia", "wilfredlaur", "w…
$ Country          <chr> "Brazil", "Brazil", "Brazil", "Canada", "Canada", "Ca…
$ Location         <chr> "Social and Work Psychology Department, University of…
$ Language         <chr> "Portuguese", "Portuguese", "Portuguese", "English", …
$ Weird            <dbl> 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ Execution        <chr> "illegal", "illegal", "illegal", "illegal", "illegal"…
$ SubjectPool      <chr> "No", "No", "No", "Yes", "Yes", "Yes", "Yes", "Yes", …
$ Setting          <chr> "In a classroom", "In a classroom", "In a classroom",…
$ Tablet           <chr> "Computers", "Computers", "Computers", "Computers", "…
$ Pencil           <chr> "No, the whole study was on the computer (except mayb…
$ StudyOrderN      <chr> "Hauser|Ross.Slate1|Rottenstrich|Graham|Kay|Inbar|And…
$ IDiffOrderN      <chr> "ID: Global self-esteem SISE|ID: Mood|ID: Subjective …
$ study.order      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ analysis.type    <chr> "Order", "Order", "Order", "Order", "Order", "Order",…
$ subset           <chr> "all", "all", "all", "all", "all", "all", "all", "all…
$ case.include     <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
d %>% 
  count(Location) %>% 
  arrange(desc(n))
# A tibble: 59 × 2
   Location                                                                    n
   <chr>                                                                   <int>
 1 University of Toronto, Scarborough                                        325
 2 MTurk India Workers                                                       308
 3 MTurk US Workers                                                          304
 4 University of Illinois at Urbana-Champaign, Champaign, IL                 198
 5 Eotvos Lorand University, in Budapest, Hungary                            180
 6 Department of Social Psychology, Tilburg University, P.O. Box 90153, T…   173
 7 Department of Psychology, San Diego State University, San Diego, CA 92…   171
 8 Department of Psychology, Pennsylvania State University Abington, Abin…   166
 9 American University of Sharjah, United Arab Emirates                      162
10 University of British Columbia, Vancouver, Canada                         147
# ℹ 49 more rows
d %>% 
  count(Location) %>% 
  arrange(n)
# A tibble: 59 × 2
   Location                                                       n
   <chr>                                                      <int>
 1 badania.net                                                   34
 2 Open University of Hong Kong                                  36
 3 Universidade do Porto, Portugal                               36
 4 Université de Poitiers, France                                44
 5 University of Navarra, Spain                                  48
 6 Santiago, Chile                                               59
 7 University of Johannesburg, Johanneburg, South Africa         70
 8 Koç University, Istanbul, Turkey                              74
 9 Department of Psychology, Ithaca College, Ithaca, NY 14850    76
10 Tufts                                                         79
# ℹ 49 more rows

item-response theory